Concurrent Discretization of Multiple Attributes
Abstract
Better decision trees can be learnt by merging continuous values into intervals. Merging of values, however, could introduce inconsistencies to the data, or information loss. When it is desired to maintain a certain consistency, interval mergings in one attribute could disable those in another attribute. This interaction raises the issue of determining the order of mergings. We consider a globally greedy heuristic that selects the best merging from all continuous attributes at each step. We present an implementation of the heuristic in which the best merging is determined in a time independent of the number of possible mergings. Experiments show that intervals produced by the heuristic lead to improved decision trees.
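The interaction between attributes described in the abstract can be illustrated with a small sketch. The abstract does not state the merit measure or the exact consistency criterion, so the code below assumes both: "consistency" is taken to be a cap on the number of records that share a discretized description but not the majority class, and the "best" merging is the one leaving the fewest such records. The names `inconsistency_count`, `discretize`, `greedy_merge`, and `max_inconsistent` are hypothetical. The sketch also rescans every candidate merging at each step, so it does not reproduce the paper's selection in time independent of the number of possible mergings.

```python
import bisect
from collections import Counter


def discretize(rows, cuts):
    """Replace each value by the index of its interval, given sorted cut points per attribute."""
    return [tuple(bisect.bisect_right(cuts[a], v) for a, v in enumerate(row))
            for row in rows]


def inconsistency_count(keys, labels):
    """Count records that disagree with the majority class of their group (assumed criterion)."""
    groups = {}
    for key, label in zip(keys, labels):
        groups.setdefault(key, Counter())[label] += 1
    return sum(sum(c.values()) - max(c.values()) for c in groups.values())


def greedy_merge(rows, labels, max_inconsistent=0):
    """Globally greedy merging: at each step, scan all attributes and remove the one
    cut point whose removal leaves the fewest inconsistencies, within the cap."""
    n_attrs = len(rows[0])
    # Start with one cut point between every pair of adjacent distinct values.
    cuts = []
    for a in range(n_attrs):
        vals = sorted({row[a] for row in rows})
        cuts.append([(lo + hi) / 2 for lo, hi in zip(vals, vals[1:])])
    while True:
        best = None  # (cost, attribute index, cut index)
        for a in range(n_attrs):
            for i in range(len(cuts[a])):
                trial = cuts[:a] + [cuts[a][:i] + cuts[a][i + 1:]] + cuts[a + 1:]
                cost = inconsistency_count(discretize(rows, trial), labels)
                if cost <= max_inconsistent and (best is None or cost < best[0]):
                    best = (cost, a, i)
        if best is None:  # every remaining merging would violate the cap
            return cuts
        _, a, i = best
        del cuts[a][i]


rows = [(1.0, 10.0), (2.0, 10.0), (3.0, 20.0), (4.0, 20.0)]
labels = ["n", "n", "y", "y"]
print(greedy_merge(rows, labels))  # [[], [15.0]]
```

In this toy data either attribute alone separates the classes. Once the first attribute has been merged into a single interval, the one remaining cut in the second attribute can no longer be removed without creating inconsistencies, which is the order dependence the abstract refers to.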
Similar resources
Discretization Based on Entropy and Multiple Scanning
In this paper we present an entropy-driven methodology for discretization. Recently, the original entropy-based discretization was enhanced by including two options of selecting the best numerical attribute. In one option, Dominant Attribute, an attribute with the smallest conditional entropy of the concept given the attribute is selected for discretization and then the best cut point is determine...
Discretization Numbers for Multiple-Instances Problem in Relational Database
Handling numerical data stored in a relational database is different from handling numerical data stored in a single table, due to the multiple occurrences of an individual record in the non-target table and non-determinate relations between tables. Most traditional data mining methods only deal with a single table and discretize columns that contain continuous numbers into nominal...
Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning
Since most real-world applications of classification learning involve continuous-valued attributes, properly addressing the discretization process is an important problem. This paper addresses the use of the entropy minimization heuristic for discretizing the range of a continuous-valued attribute into multiple intervals. We briefly present theoretical evidence for the appropriateness of this h...
Chi2: feature selection and discretization of numeric attributes
Discretization can turn numeric attributes into discrete ones. Feature selection can eliminate some irrelevant attributes. This paper describes Chi2, a simple and general algorithm that uses the χ² statistic to discretize numeric attributes repeatedly until some inconsistencies are found in the data, and achieves feature selection via discretization. The empirical results demonstrate that Chi2 i...
An Evolution Strategies Approach to the Simultaneous Discretization of Numeric Attributes
Many data mining and machine learning algorithms require databases in which objects are described by discrete attributes. However, it is very common that the attributes are in the ratio or interval scales. In order to apply these algorithms, the original attributes must be transformed into the nominal or ordinal scale via discretization. An appropriate transformation is crucial because of the l...